Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance

نویسندگان

  • Mauricio Santillana
  • André T. Nguyen
  • Mark Dredze
  • Michael J. Paul
  • Elaine O. Nsoesie
  • John S. Brownstein
چکیده

We present a machine learning-based methodology capable of providing real-time ("nowcast") and forecast estimates of influenza activity in the US by leveraging data from multiple data sources including: Google searches, Twitter microblogs, nearly real-time hospital visit records, and data from a participatory surveillance system. Our main contribution consists of combining multiple influenza-like illnesses (ILI) activity estimates, generated independently with each data source, into a single prediction of ILI utilizing machine learning ensemble approaches. Our methodology exploits the information in each data source and produces accurate weekly ILI predictions for up to four weeks ahead of the release of CDC's ILI reports. We evaluate the predictive ability of our ensemble approach during the 2013-2014 (retrospective) and 2014-2015 (live) flu seasons for each of the four weekly time horizons. Our ensemble approach demonstrates several advantages: (1) our ensemble method's predictions outperform every prediction using each data source independently, (2) our methodology can produce predictions one week ahead of GFT's real-time estimates with comparable accuracy, and (3) our two and three week forecast estimates have comparable accuracy to real-time predictions using an autoregressive model. Moreover, our results show that considerable insight is gained from incorporating disparate data streams, in the form of social media and crowd sourced data, into influenza predictions in all time horizons.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Methods Using Social Media and Search Queries to Predict Infectious Disease Outbreaks

Objectives For earlier detection of infectious disease outbreaks, a digital syndromic surveillance system based on search queries or social media should be utilized. By using real-time data sources, a digital syndromic surveillance system can overcome the limitation of time-delay in traditional surveillance systems. Here, we introduce an approach to develop such a digital surveillance system. ...

متن کامل

Scoping Review on Search Queries and Social Media for Disease Surveillance: A Chronology of Innovation

BACKGROUND The threat of a global pandemic posed by outbreaks of influenza H5N1 (1997) and Severe Acute Respiratory Syndrome (SARS, 2002), both diseases of zoonotic origin, provoked interest in improving early warning systems and reinforced the need for combining data from different sources. It led to the use of search query data from search engines such as Google and Yahoo! as an indicator of ...

متن کامل

Using Social Media to Perform Local Influenza Surveillance in an Inner-City Hospital: A Retrospective Observational Study

BACKGROUND Public health officials and policy makers in the United States expend significant resources at the national, state, county, and city levels to measure the rate of influenza infection. These individuals rely on influenza infection rate information to make important decisions during the course of an influenza season driving vaccination campaigns, clinical guidelines, and medical staffi...

متن کامل

Using Participatory Web-based Surveillance Data to Improve Seasonal Influenza Forecasting in Italy

Traditional surveillance of seasonal influenza is generally affected by reporting lags of at least one week and by continuous revisions of the numbers initially released. As a consequence, influenza forecasts are often limited by the time required to collect new and accurate data. On the other hand, the availability of novel data streams for disease detection can help in overcoming these issues...

متن کامل

Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea

BACKGROUND As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره 11  شماره 

صفحات  -

تاریخ انتشار 2015